home *** CD-ROM | disk | FTP | other *** search
Text File | 1996-05-10 | 4.7 KB | 74 lines | [TEXT/R*ch] |
- Postman’s Sort
- Everyone knows that sorting algorithms require execution time that grows faster
- than linearly with the number of input records. Specifically, sorting N input
- records is known to require somewhere between O(Nlog2N) and O(N2) comparisons,
- depending on the specific algorithm and whether one is considering average or
- worst-case time, right? Wrong. Your Challenge is to code a sorting algorithm
- that requires time proportional to the number of input records.
- The bounds in the previous paragraph apply to sorting algorithms that are based
- on pairwise comparison of input keys. A distribution sort, however, requires no
- key comparisons, and has different performance characteristics. Imagine, for
- example, how the post office might sort mail. Initially, the mail might be
- distributed into piles by country. Each of those piles might be distributed
- into other piles by state. Those piles might be distributed by city, and then
- by street, and finally by address. All of this could be accomplished by making
- a sequence of passes through the input without ever comparing the sort keys of
- two input records to one another. This month, you are to implement a
- distribution (or “postman’s”) sort algorithm. The prototype for the code you
- should write is:
-
- #include <stdio.h>
- static pascal OSErr PostmansSort(
- FILE *inFile, /* stream containing sort input */
- FILE *outFile, /* stream to contain sort output */
- void *privateStorage, /* preallocated storage for your use */
- size_t storageSize /* number of bytes of privateStorage */
- );
-
- The input will consist of records in the following format:
-
- [FirstName] <tab> [MiddleName] <tab> LastName <tab>
- StreetAddress <tab> StreetName <tab>
- City <tab> State <tab> [ZipCode] <tab> [Country] <CR>
-
- Input records should be read from inFile, sorted, and written to outFile. Both
- the input and output files will be opened and closed for you by the calling
- routine. Records will consist of up to 8 fields, as indicated above, and be
- terminated by a carriage return (0x0D). Fields will be separated by tabs (0x09).
- The square brackets in the format description are meta-characters indicating
- optional fields; the brackets will not be present in the input. Fields may
- contain embedded spaces or special characters (except tabs and returns).
- Records are to be sorted into ascending order with Country as the primary sort
- key, ZipCode (if present) as the secondary key, then StreetName, StreetAddress,
- LastName, FirstName, and finally MiddleName, in that order. If ZipCode is not
- present, then State and City should replace it as the secondary and tertiary
- sort keys, respectively; otherwise State and City should be ignored. Sort order
- should be lexicographic except when a field value is purely numeric, and a
- purely numeric field should be treated as smaller than a field containing
- non-numeric characters. That is, field values “1”, “2”, “10”, “1B”, “1C”, and
- “2A” would sort in the order indicated. Empty optional fields should be
- considered to have the smallest possible value (e.g., records with no Country
- should be output before records with a Country value). Your routine should
- return zero (noErr) if it completes normally, or a non-zero error code if it is
- unable to complete the sort for any reason.
- There are no predetermined limits on the number of distinct countries, State,
- City, etc. values that might be present in the input. Your routine will be
- provided with storageSize bytes (at least 64KB) of preallocated storage,
- initialized to zero by the calling routine, and pointed to by privateStorage.
- You may not allocate any additional memory, although use of small amounts of
- static memory is permitted. If your solution requires additional storage, you
- should create and write to temporary disk files. Adequate disk space is
- guaranteed. In approximately half of the test cases, you will be guaranteed at
- least 32 bytes of memory per record, plus 32 bytes for each unique field value.
- Scoring will weight these high-memory cases equally with the cases where less
- memory is provided.
- This will be a native PowerPC Challenge, scored using the Metrowerks
- environment. Solutions may be coded in C, C++, or Pascal. If you use a
- language other than C, you should ensure that your code can be called from a
- test driver written in C. Most importantly, to be considered correct, your code
- must sort the input into the proper order without performing any key comparisons
- between input records. The fastest correct solution will be the winner.
- This Challenge is based on a suggestion from Peter Shank, passed on to me by
- Mike Scanlin. Peter forwarded a reference to an article on the subject by
- Robert Ramsey, which you can find at http://www.silcom.com/~ramey/article.html.
-